io_uring: add write_buffer_high_watermark for io_uring sockets#44666
io_uring: add write_buffer_high_watermark for io_uring sockets#44666aburan28 wants to merge 4 commits into
Conversation
Expose ``IoUringOptions.write_buffer_high_watermark`` to bound the size of ``IoUringServerSocket::write_buf_``. Without a cap, the upper layer always sees writes as fully accepted (because the io_uring socket is async and stages bytes internally), so connection-level back-pressure -- and overload protections that depend on it, e.g. HTTP flood protection -- never engages. When the new field is set to a non-zero value, ``IoUringServerSocket::write`` returns a short write once ``write_buf_`` reaches the configured size, and ``IoUringSocketHandleImpl::write`` reports the actual bytes moved (rather than the original buffer length). The upper layer's connection-level write buffer holds the remainder, where the existing watermark mechanism applies. After the in-flight write completes and the buffer drops below the threshold, an injected Write completion is delivered so the upper layer retries. The default of 0 disables the cap and preserves the previous (uncapped) behavior. Resolves the existing TODO in ``IoUringServerSocket`` referencing the ``IntegrationTest.TestFloodUpstreamErrors`` timeout. Risk Level: Low Testing: Added unit tests for the buffer- and slice-based ``write`` overloads, including the back-pressure release on completion. Docs Changes: New field documented inline in the proto. Release Notes: Added. Platform Specific Features: io_uring (Linux >=5.11) Signed-off-by: Adam Buran <aburan28@gmail.com>
|
Hi @aburan28, welcome and thank you for your contribution. We will try to review your Pull Request as quickly as possible. In the meantime, please take a look at the contribution guidelines if you have not done so already. |
|
CC @envoyproxy/api-shepherds: Your approval is needed for changes made to |
|
@aburan can you please merge main to resolve the conflict? (We recognize merge conflicts on this file are a pain and are working on making it less annoying, but until that's done, this annoyance is an unfortunate part of getting a PR merged.) |
…rite-watermark # Conflicts: # changelogs/current.yaml
adisuissa
left a comment
There was a problem hiding this comment.
Thanks!
Left a high-level question, as I'm not very familiar with io_uring.
| // asynchronously. If the remote stops reading, the io_uring write operation may never complete. | ||
| // The operation is canceled and the socket is closed after the timeout. The default is 1000. | ||
| google.protobuf.UInt32Value write_timeout_ms = 4; | ||
| // High watermark in bytes for the io_uring socket's internal write buffer. Because io_uring |
There was a problem hiding this comment.
High level question:
Is this actually a bug in the way io_uring is used in Envoy?
Would it make more sense to have the same watermark threshold values that are used for the rest of Envoy's flow-control mechanism (in other words, why should io_uring be considered something different than the non-io_uring socket)?
cc @KBaichoo who may have more context.
|
Needs main merge. |
Signed-off-by: Adam Buran <aburan28@gmail.com>
Signed-off-by: Adam Buran <aburan28@gmail.com> # Conflicts: # test/common/io/io_uring_worker_impl_test.cc
Commit Message:
Expose
IoUringOptions.write_buffer_high_watermarkto bound the size ofIoUringServerSocket::write_buf_.Without a cap, the upper layer always sees writes as fully accepted (because
the io_uring socket is async and stages bytes internally), so connection-level
back-pressure — and overload protections that depend on it, e.g. HTTP flood
protection — never engages.
When the new field is set to a non-zero value,
IoUringServerSocket::writereturns a short write once
write_buf_reaches the configured size, andIoUringSocketHandleImpl::writereports the actual bytes moved (rather thanthe original buffer length). The upper layer's connection-level write buffer
holds the remainder, where the existing watermark mechanism applies. After
the in-flight write completes and the buffer drops below the threshold, an
injected
Writecompletion is delivered so the upper layer retries.The default of 0 disables the cap and preserves the previous (uncapped)
behavior. Resolves the existing TODO in
IoUringServerSocketreferencing theIntegrationTest.TestFloodUpstreamErrorstimeout.Additional Description:
This is a draft PR opened for review and discussion. The new field is opt-in
(default 0) so existing deployments are unaffected.
AI usage disclosure: Portions of the code and/or PR description were drafted
with the assistance of Claude (Anthropic). I reviewed and understand all
submitted code.
Risk Level: Low — opt-in via a new config field; default-0 path preserves the
previous (uncapped) behavior. Behavior change only kicks in when the field is
set to a non-zero value.
Testing:
Added unit tests (
IoUringWorkerImplTest.WriteBufferHighWatermarkandWriteSliceBufferHighWatermark) for bothwriteoverloads, exercising thecap, the short-write reporting via the socket handle, and the back-pressure
release (injected
Writecompletion) when the in-flight write completes.Docs Changes:
New field is documented inline in
api/envoy/extensions/network/socket_interface/v3/default_socket_interface.proto.Release Notes:
Added a
new_featuresentry inchangelogs/current.yamlunderio_uring.Platform Specific Features:
io_uring (Linux >= 5.11). No platform support change beyond the existing
io_uring build gating.
Runtime guard: N/A. The new field defaults to 0, which is the previous
behavior; the config field itself is the gate.
API Considerations:
New optional field on
IoUringOptions(
api/envoy/extensions/network/socket_interface/v3/default_socket_interface.proto).Default of 0 preserves prior behavior. No removed fields, no breaking
changes.